Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Abstract

$$G(\theta_1) - G(\theta_2) = F(\bar\theta)(\theta_1 - \theta_2) \tag{17}$$

Since $\dot\mu > 0$ and $\lambda_{\min}(V) > 0$, for any $\theta_1 \neq \theta_2$ we have

$$(\theta_1 - \theta_2)^\top \big(G(\theta_1) - G(\theta_2)\big) = (\theta_1 - \theta_2)^\top F(\bar\theta)(\theta_1 - \theta_2) \geq \min_i \dot\mu(X_i^\top\bar\theta)\,(\theta_1 - \theta_2)^\top V (\theta_1 - \theta_2) > 0.$$

Hence, $G(\theta)$ is an injection from $\mathbb{R}^d$ to $\mathbb{R}^d$, and so $G^{-1}$ is a well-defined function. Consequently, (15) has a unique solution $\hat\theta = G^{-1}(Z)$.

Let us consider an $\eta$-neighborhood of $\theta^*$, $B_\eta := \{\theta : \|\theta - \theta^*\| \leq \eta\}$, where $\eta > 0$ is a constant that will be specified later. Note that $B_\eta$ is a convex set, thus $\bar\theta \in B_\eta$ as long as $\theta_1, \theta_2 \in B_\eta$. Define $\kappa_\eta := \inf_{\theta \in B_\eta} \dot\mu(x^\top\theta) > 0$. From (17), for any $\theta \in B_\eta$,

$$\|G(\theta)\|^2_{V^{-1}} = \|G(\theta) - G(\theta^*)\|^2_{V^{-1}} = (\theta - \theta^*)^\top F(\bar\theta)\, V^{-1} F(\bar\theta)\, (\theta - \theta^*) \geq \kappa_\eta^2\, \lambda_{\min}(V)\, \|\theta - \theta^*\|^2,$$

where the last inequality is due to the fact that $F(\bar\theta) \succeq \kappa_\eta V$. On the other hand, Lemma A of Chen et al. (1999) implies that ...
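The lower bound on $\|G(\theta)\|^2_{V^{-1}}$ can be checked numerically for a concrete link function. The sketch below assumes a logistic link $\mu(z) = 1/(1+e^{-z})$; all names (`G`, `kappa`, the sizes) are illustrative choices for this sketch, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d)) / np.sqrt(d)      # contexts X_i
theta_star = np.array([0.5, -0.3, 0.2])       # true parameter theta*

mu = lambda z: 1.0 / (1.0 + np.exp(-z))       # logistic link
mu_dot = lambda z: mu(z) * (1.0 - mu(z))      # its derivative (positive everywhere)

V = X.T @ X                                   # V = sum_i X_i X_i'

def G(theta):
    # G(theta) = sum_i (mu(X_i' theta) - mu(X_i' theta*)) X_i, so G(theta*) = 0
    return X.T @ (mu(X @ theta) - mu(X @ theta_star))

theta = theta_star + 0.1 * rng.normal(size=d)  # a point near theta* (inside B_eta)

# Lower bound on mu_dot along the segment between theta and theta*: the logistic
# mu_dot is unimodal, so its minimum on each segment is attained at an endpoint.
kappa = min(mu_dot(X @ theta).min(), mu_dot(X @ theta_star).min())

lhs = G(theta) @ np.linalg.inv(V) @ G(theta)                   # ||G(theta)||^2_{V^-1}
rhs = kappa**2 * np.linalg.eigvalsh(V)[0] * np.sum((theta - theta_star) ** 2)
print(lhs >= rhs)   # prints: True
```

Here `kappa` plays the role of $\kappa_\eta$ restricted to the observed contexts; the bound holds because $F(\bar\theta) \succeq \kappa V$ implies $F V^{-1} F \succeq \kappa^2 V$.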


Similar articles

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Contextual bandits are widely used in Internet services from news recommendation to advertising, and to Web search. Generalized linear models (logistic regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses on contextual bandits so far are on linear bandits. In this work, we propose ...
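A generalized linear contextual bandit of this kind can be sketched as a UCB-style loop over a logistic reward model: fit a (regularized) maximum-likelihood estimate on the history, then pick the arm maximizing an optimistic score. This is a minimal illustration, not the paper's algorithm; the exploration weight `alpha`, warm-up length, and ridge regularizer are our choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 3, 5, 300                      # dimension, arms per round, horizon (toy sizes)
alpha = 0.5                              # exploration weight (our choice)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def fit_mle(X, y, iters=20):
    """Ridge-regularized logistic MLE via Newton's method (regularizer is our addition)."""
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + theta
        hess = (X * (p * (1 - p))[:, None]).T @ X + np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta

X_hist, y_hist = [], []
V = np.eye(d)                            # regularized design matrix
regret = 0.0
for t in range(T):
    arms = rng.normal(size=(K, d)) / np.sqrt(d)        # fresh contexts each round
    if t < 20:                                         # short random warm-up
        x = arms[rng.integers(K)]
    else:
        theta = fit_mle(np.array(X_hist), np.array(y_hist))
        widths = np.sqrt(np.sum((arms @ np.linalg.inv(V)) * arms, axis=1))
        x = arms[np.argmax(arms @ theta + alpha * widths)]   # optimistic arm
    reward = float(rng.random() < sigmoid(x @ theta_star))   # Bernoulli reward
    X_hist.append(x); y_hist.append(reward)
    V += np.outer(x, x)
    regret += sigmoid(arms @ theta_star).max() - sigmoid(x @ theta_star)

print(f"average per-round regret: {regret / T:.3f}")
```

The score $x^\top\hat\theta + \alpha\|x\|_{V^{-1}}$ mirrors the linear-bandit UCB, with the reward nonlinearity handled only through the MLE fit.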


Fairness in Learning: Classic and Contextual Bandits

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on “chained” confidence intervals, and prove a cumulative r...
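The "chained" confidence-interval idea can be illustrated in a classic stochastic bandit: link together all arms whose intervals overlap the top arm's interval (transitively), then choose uniformly at random inside that chain, so a worse arm is never favored over one that might plausibly be better. This is a simplified stand-in for the fair algorithm described above; the interval radius and toy means are our choices.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
mus = [0.2, 0.5, 0.55]                 # true (unknown) arm means; toy values
K, T = len(mus), 2000
counts = np.zeros(K)
sums = np.zeros(K)

def interval(i, t):
    """Hoeffding-style confidence interval for arm i (our simplified radius)."""
    if counts[i] == 0:
        return 0.0, 1.0
    mean = sums[i] / counts[i]
    w = math.sqrt(math.log(max(t, 2)) / counts[i])
    return mean - w, mean + w

for t in range(1, T + 1):
    ivals = [interval(i, t) for i in range(K)]
    # Build the chain: start from the arm with the highest upper bound and keep
    # adding arms (in decreasing upper-bound order) whose intervals overlap the
    # chain; play uniformly within it.
    order = sorted(range(K), key=lambda i: -ivals[i][1])
    chain, low = [order[0]], ivals[order[0]][0]
    for i in order[1:]:
        if ivals[i][1] < low:
            break
        chain.append(i)
        low = min(low, ivals[i][0])
    a = int(rng.choice(chain))
    counts[a] += 1
    sums[a] += float(rng.random() < mus[a])

print("pulls per arm:", counts.astype(int))
```

Once the weakest arm's upper bound falls below the chain, it stops being pulled; the two close arms remain statistically indistinguishable and keep being treated identically.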


Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of t...
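One simple way to exploit Lipschitz structure, in the spirit of the algorithms above (though not their exact indices), is to tighten each arm's UCB using every other arm's statistics: arm $x$'s mean can exceed neither its own bound nor $\hat\mu_y + w_y + L|x-y|$ for any $y$. The discretization, noise level, and reward function below are illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
L = 1.0                                      # Lipschitz constant, assumed known
arms = np.linspace(0.0, 1.0, 21)             # discretized arm set on [0, 1]
f = lambda x: 0.9 - L * np.abs(x - 0.3)      # true L-Lipschitz mean reward, peak at 0.3
K, T = len(arms), 3000
counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        a = t - 1                            # pull each arm once to initialize
    else:
        means = sums / counts
        naive_ucb = means + np.sqrt(2.0 * math.log(t) / counts)
        # Lipschitz tightening: arm x's mean is at most naive_ucb[y] + L|x - y|
        # for every y, so take the smallest such bound over y.
        dist = np.abs(arms[:, None] - arms[None, :])
        ucb = np.min(naive_ucb[None, :] + L * dist, axis=1)
        a = int(np.argmax(ucb))
    counts[a] += 1
    sums[a] += f(arms[a]) + 0.1 * rng.normal()   # noisy reward

best = arms[np.argmax(sums / counts)]
print(f"empirically best arm: {best:.2f} (true optimum at 0.30)")
```

The `min` over neighbors shrinks the optimism of far-away arms early, concentrating pulls near the peak faster than independent per-arm UCBs would.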


Linear Contextual Bandits with Knapsacks

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...
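The outcome model in this setting, reward plus a consumption vector, both linear in the context, can be made concrete with a toy simulation. The greedy "reward per unit of consumption" rule below is only a baseline to show the budget mechanics; the paper's algorithm must additionally learn the unknown weights, which this sketch takes as given.

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, budget = 3, 4, 50.0                 # context dim, arms, resource budget (toy)
theta_r = rng.uniform(0.2, 1.0, size=d)   # expected reward = context . theta_r
theta_c = rng.uniform(0.2, 1.0, size=d)   # expected cost   = context . theta_c

total_reward, rounds = 0.0, 0
while budget > 0 and rounds < 10_000:
    ctx = rng.uniform(0.0, 1.0, size=(K, d))     # one context per arm
    exp_r, exp_c = ctx @ theta_r, ctx @ theta_c
    # Greedy "bang per buck": maximize expected reward per unit of expected
    # consumption; play until the budget is exhausted.
    a = int(np.argmax(exp_r / exp_c))
    total_reward += exp_r[a] + 0.05 * rng.normal()
    budget -= max(exp_c[a] + 0.05 * rng.normal(), 0.0)
    rounds += 1

print(f"rounds played: {rounds}, total reward: {total_reward:.1f}")
```

The stopping condition is what distinguishes this setting from ordinary bandits: the horizon is determined by cumulative consumption, not by a fixed round count.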


Linear Contextual Bandits with Global Constraints and Objective

We consider the linear contextual bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a vector that depends linearly on the context of that arm. The global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. This probl...
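The structure here, vector outcomes linear in the context, with a concave objective of their running average, can be sketched with a Frank-Wolfe-flavored greedy rule that picks the arm whose expected outcome most improves the objective. This is our illustration of the outcome model only, not the paper's method; the linear map `W` and the `min`-style objective are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, K, T = 3, 2, 4, 500                # context dim, outcome dim, arms, horizon (toy)
W = rng.uniform(0.0, 1.0, size=(m, d))   # outcome vector = W @ context (linear model)
objective = lambda v: float(np.min(v))   # a concave function of the average outcome

avg = np.zeros(m)
for t in range(1, T + 1):
    ctx = rng.uniform(0.0, 1.0, size=(K, d))
    outcomes = ctx @ W.T                 # expected outcome vector for each arm
    # Greedy: pick the arm whose outcome most improves the concave objective
    # of the running average after this round's update.
    scores = [objective(avg + (o - avg) / t) for o in outcomes]
    a = int(np.argmax(scores))
    avg += (outcomes[a] - avg) / t

print("average outcome:", np.round(avg, 2), "| objective value:", round(objective(avg), 2))
```

A constraint set would enter the same loop as a penalty or projection on `avg`; the key object throughout is the running average vector rather than a scalar cumulative reward.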



Publication date: 2017